Recently, dropout has seen increasing use in deep learning. For deep convolutional neural networks, dropout is known to work well in fully-connected layers. However, its effect in pooling layers is still not clear. This paper demonstrates that max-pooling dropout at training time is equivalent to randomly picking an activation according to a multinomial distribution. In light of this insight, we advocate employing our proposed probabilistic weighted pooling, instead of the commonly used max-pooling, to perform model averaging at test time. Empirical evidence validates the superiority of probabilistic weighted pooling. We also compare max-pooling dropout with stochastic pooling, both of which introduce stochasticity at the pooling stage based on multinomial distributions.
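The multinomial view can be sketched as follows: with retain probability p (dropout rate 1 - p), a sorted activation a_i in a pooling region is selected by max-pooling dropout exactly when all larger activations are dropped and a_i is retained, giving selection probability p(1-p)^(n-i) for the i-th largest of n units. The test-time probabilistic weighted pooling then averages the activations under these probabilities. This is a minimal NumPy sketch of both operations; the function names and the single-region interface are illustrative, not from the paper.

```python
import numpy as np

def max_pooling_dropout(region, retain_p, rng):
    """Training-time max-pooling dropout on one pooling region: drop each unit
    independently with probability 1 - retain_p, then take the max of the
    retained units (0 if every unit is dropped). The output is thus a sample
    from a multinomial distribution over the region's activations."""
    mask = rng.random(region.shape) < retain_p
    kept = region[mask]
    return float(kept.max()) if kept.size else 0.0

def probabilistic_weighted_pooling(region, retain_p):
    """Test-time pooling: weight each activation by the probability that
    max-pooling dropout would select it, i.e. the multinomial probabilities."""
    q = 1.0 - retain_p
    a = np.sort(region.ravel())  # ascending: a_1 <= ... <= a_n
    n = a.size
    # a[i] is selected iff all larger units are dropped and a[i] is retained:
    # p_i = retain_p * q**(n - 1 - i); with probability q**n the output is 0,
    # so the probabilities (including the all-dropped event) sum to 1.
    probs = retain_p * q ** (n - 1 - np.arange(n))
    return float(np.dot(probs, a))
```

With retain_p = 1.0 both functions reduce to ordinary max-pooling, since the largest unit is then selected with probability 1.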